Contention-free Complete Exchange Algorithm on Clusters
Abstract
To construct a large commodity cluster, a hierarchical network is generally adopted for connecting the host machines, where a Gigabit backbone switch connects a few commodity switches through uplinks to achieve scaled bisection bandwidth. This type of interconnection usually results in link contention, with congestion developing at the uplink ports. Moreover, the non-deterministic delays in scheduling communication events in clusters accelerate the build-up of congestion at these uplink ports, which leads to severe packet drops and hinders overall performance. In this paper, we focus on the practical design of a high-speed complete exchange algorithm on a commodity cluster interconnected by a hierarchical Ethernet-based network. Using architectural characteristics of the network to optimize the performance of a complete exchange algorithm, we introduce a congestion control mechanism, global windowing, that monitors and regulates the traffic load, together with a permutation scheme, the reorder scheme, that effectively alleviates the congestion problem. We evaluate the modified algorithm and compare its performance with that of the original algorithm and a well-known algorithm on a PC cluster connected by various types of switches, including Gigabit Ethernet, input-buffered Fast Ethernet, and shared-memory Fast Ethernet switches.
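The abstract does not spell out how global windowing or the reorder scheme work, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a textbook shift-exchange schedule (in step s, node i sends to node (i + s) mod P), a hypothetical two-level layout in which consecutive node IDs share a switch, and a hypothetical `window` parameter that caps how many messages a switch may push through its uplink per round. All function names are illustrative.

```python
def shift_exchange_schedule(num_nodes):
    """Return the steps of a shift exchange.

    In step s, node i sends to (i + s) % num_nodes, so every step is a
    permutation: each node sends once and receives once.
    """
    return [
        [(src, (src + s) % num_nodes) for src in range(num_nodes)]
        for s in range(1, num_nodes)
    ]


def uplink_load(step, nodes_per_switch):
    """Count, per source switch, the messages in a step that cross an uplink.

    Assumption: nodes 0..k-1 sit on switch 0, k..2k-1 on switch 1, and so on.
    """
    load = {}
    for src, dst in step:
        src_switch = src // nodes_per_switch
        if src_switch != dst // nodes_per_switch:
            load[src_switch] = load.get(src_switch, 0) + 1
    return load


def apply_window(step, nodes_per_switch, window):
    """Split one step into rounds so that no switch sends more than
    `window` messages through its uplink in the same round.

    This is a greedy stand-in for a traffic-regulating window, labeled
    hypothetical: the real mechanism monitors load at run time.
    """
    rounds = []  # list of (messages, per-switch uplink counts)
    for src, dst in step:
        src_switch = src // nodes_per_switch
        crosses = src_switch != dst // nodes_per_switch
        for messages, counts in rounds:
            if not crosses or counts.get(src_switch, 0) < window:
                messages.append((src, dst))
                if crosses:
                    counts[src_switch] = counts.get(src_switch, 0) + 1
                break
        else:
            # No existing round has room on this switch's uplink.
            rounds.append(([(src, dst)], {src_switch: 1} if crosses else {}))
    return [messages for messages, _ in rounds]


schedule = shift_exchange_schedule(8)
worst_step = max(schedule, key=lambda st: sum(uplink_load(st, 4).values()))
throttled = apply_window(worst_step, nodes_per_switch=4, window=2)
```

With 8 nodes on two 4-port switches, the step with shift 4 sends every message across the backbone; capping each uplink at 2 in-flight messages splits that step into two rounds. The trade-off in this sketch is more rounds in exchange for a bounded load on each uplink port.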
Similar resources
Efficient Scheduling of Complete Exchange on Clusters
One of the performance limitations of clusters is their message passing capability, while complete exchange is known to be the severest communication pattern on all types of message passing machines. In this paper, we focus on the practical issues of designing high-speed complete exchange algorithms on a commodity cluster interconnected by a non-blocking crossbar switch. Four complete exchange ...
Efficient Scheduling of Complete Exchange on Clusters
In this paper, we focus on the practical issues of designing efficient complete exchange algorithms on a commodity cluster interconnected by a non-blocking crossbar switch. Four complete exchange algorithms, including shift exchange, pairwise exchange, group shuffle exchange, and synchronous shuffle exchange algorithms, are studied and tested on a cluster platform. These algorithms feature their own c...
Bandwidth optimal all-reduce algorithms for clusters of workstations
We consider an efficient realization of the all-reduce operation with large data sizes in cluster environments, under the assumption that the reduce operator is associative and commutative. We derive a tight lower bound of the amount of data that must be communicated in order to complete this operation and propose a ring-based algorithm that only requires tree connectivity to achieve bandwidth ...
Assessing Contention Effects on MPI_Alltoall Communications
One of the most important collective communication patterns used in scientific applications is the complete exchange, also called All-to-All. Although efficient algorithms have been studied for specific networks, general solutions like those available in well-known MPI distributions (e.g. the MPI_Alltoall operation) are strongly influenced by the congestion of network resources. In this paper w...
Assessing contention effects of all-to-all communications on clusters and grids
One of the most important collective communication patterns used in scientific applications is the complete exchange, also called All-to-All. Although efficient algorithms have been studied for specific networks, general solutions like those available in well-known MPI distributions (e.g. the MPI_Alltoall operation) are strongly influenced by the congestion of network resources. In this paper we...